A perceptual study of acceleration parameters in HMM-based TTS

Authors

Yining Chen

Zhi-Jie Yan

Frank K. Soong

Abstract

In HMM-based TTS, statistical models of static, velocity (delta), and acceleration (delta-delta) parameters are jointly trained in a unified, ML-based framework. A previous study has shown that the acceleration parameters help generate smoother trajectories with less distortion, but the effect has never been investigated in formal objective and subjective tests. In this paper, the effect of the acceleration parameters on trajectory generation, in addition to that of their static and velocity counterparts, is studied in depth. We show that discarding the acceleration parameters introduces only a small additional distortion compared to the reference generated with the full model parameters. However, human subjects can easily perceive the degradation in voice quality, because saw-tooth-like trajectories are commonly generated. Several methods to alleviate the discontinuity are discussed, and we choose the upper- and lower-bounded envelopes of the saw-tooth trajectories for further analysis. Experimental results show that both envelope trajectories have larger objective distortions than the saw-tooth ones. However, the speech synthesized using the envelope trajectory becomes perceptually transparent to the reference. This study, in addition to its subjective and objective significance in measuring the distortion of synthesized speech, facilitates efficient implementation of low-cost TTS systems, as well as low-bit-rate speech coding and reconstruction.
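To make the role of the acceleration stream concrete, the following is a minimal sketch of maximum-likelihood trajectory generation from static, delta, and delta-delta statistics, with the option of dropping the acceleration stream. It is not the authors' implementation: the window coefficients in `WINDOWS`, the edge clamping, the diagonal variances, and the function names `window_matrix` and `mlpg` are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed window coefficients (a common choice, not taken from the paper):
# static, velocity (delta), acceleration (delta-delta).
WINDOWS = [[1.0], [-0.5, 0.0, 0.5], [1.0, -2.0, 1.0]]

def window_matrix(T, windows):
    """Stack one T x T matrix per window; each maps a static
    trajectory of length T to that stream's dynamic features."""
    blocks = []
    for win in windows:
        half = len(win) // 2
        W = np.zeros((T, T))
        for t in range(T):
            for k, w in enumerate(win):
                # Clamp indices at the utterance edges (an assumption).
                tau = min(max(t + k - half, 0), T - 1)
                W[t, tau] += w
        blocks.append(W)
    return np.vstack(blocks)

def mlpg(means, variances, windows):
    """Solve (W' P W) c = W' P mu for the static trajectory c.

    means, variances: arrays of shape (len(windows), T), one row per
    stream, with diagonal covariances (P is the diagonal precision).
    """
    T = means.shape[1]
    W = window_matrix(T, windows)
    mu = means.reshape(-1)
    prec = 1.0 / variances.reshape(-1)
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)
```

Under these assumptions, `mlpg(means, variances, WINDOWS)` uses all three streams, while `mlpg(means[:2], variances[:2], WINDOWS[:2])` discards the acceleration statistics, which is the kind of comparison the abstract describes.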

Related resources

XIMERA: a new TTS from ATR based on corpus-based technologies

This paper describes a new concatenative TTS system under development at ATR. The system, named XIMERA, is based on corpus-based technologies, as was the case for the preceding TTS systems from ATR, namely ν-talk and CHATR. The prominent features of XIMERA are (1) large corpora (a 110-hour corpus of a Japanese male, a 60-hour corpus of a Japanese female, and a 20-hour corpus of a Chinese fema...

HMM-based TTS for Hanoi Vietnamese: issues in design and evaluation

This paper presents the development and evaluation of an HMM-based TTS system for the modern Hanoi dialect of Northern Vietnamese, a tonal language. A study of specific phonetic and prosodic features of Hanoi Vietnamese is discussed. Consequences on the design of an HMM-based TTS system are derived. Using this knowledge, a TTS system, called VTed, is then developed under the Mary TTS platform. ...

Advances in Spectral Parameterization for Statistical (HMM-Based) TTS

HMM-based parametric speech synthesis has recently become an alternative to the concatenative TTS approach, especially when low footprint and general speech domain are required. A majority of speech parameterization models used in state-of-the-art HMM TTS systems employ source-filter waveform synthesis schemes. Sinusoidal representation and waveform generation of speech is an alternative to the ...

Linguistic and mixed excitation improvements on a HMM-based speech synthesis for Castilian Spanish

Hidden Markov Model-based text-to-speech (HMM-TTS) synthesis is one of the techniques for generating speech from trained statistical models, in which the spectrum and prosody of basic speech units are modelled together. This paper presents the advances in our Spanish HMM-TTS, and a perceptual test is conducted to compare it with an extended PSOLA-based concatenative (E-PSOLA) system. The improvements...

Syllable HMM based Mandarin TTS and comparison with concatenative TTS

This paper introduces a Syllable HMM based Mandarin TTS system. 10-state left-to-right HMMs are used to model each syllable. We leverage the corpus and the front end of a concatenative TTS system to build the Syllable HMM based TTS system. Furthermore, we utilize the unique consonant/vowel structure of Mandarin syllables to improve the voiced/unvoiced decision of HMM states. Evaluation results s...

Publication date: 2010
